Greedy bi-criteria approximations for k-medians and k-means

Authors

  • Daniel J. Hsu
  • Matus Telgarsky
Abstract

This paper investigates the following natural greedy procedure for clustering in the bi-criterion setting: iteratively grow a set of centers, in each round adding the center from a candidate set that maximally decreases clustering cost. In the case of k-medians and k-means, the key results are as follows.

  • When the method considers all data points as candidate centers, selecting $O(k \log(1/\varepsilon))$ centers achieves cost at most $2 + \varepsilon$ times the optimal cost with $k$ centers.
  • Alternatively, the same guarantees hold if each round samples $O(k/\varepsilon^5)$ candidate centers proportionally to their cluster cost (as with kmeans++, but holding centers fixed).
  • In the case of k-means, considering an augmented set of $n^{\lceil 1/\varepsilon \rceil}$ candidate centers gives a $1 + \varepsilon$ approximation with $O(k \log(1/\varepsilon))$ centers, the entire algorithm taking $O(dk \log(1/\varepsilon)\, n^{1 + \lceil 1/\varepsilon \rceil})$ time, where $n$ is the number of data points in $\mathbb{R}^d$.
  • In the case of Euclidean k-medians, generating a candidate set via $n^{O(1/\varepsilon^2)}$ executions of stochastic gradient descent with adaptively determined constraint sets once again gives a $1 + \varepsilon$ approximation with $O(k \log(1/\varepsilon))$ centers in $dk \log(1/\varepsilon)\, n^{O(1/\varepsilon^2)}$ time.

Ancillary results include: guarantees for cluster costs based on powers of metrics; a brief, favorable empirical evaluation against kmeans++; and data-dependent bounds allowing $1 + \varepsilon$ in the first two bullets above, for example with k-medians over finite metric spaces.

arXiv:1607.06203v1 [cs.DS] 21 Jul 2016
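The greedy procedure described in the abstract can be sketched roughly as follows for the k-means cost. This is a minimal illustration, not the authors' implementation: the function names `sq_dist` and `greedy_centers`, the parameter `num_sampled`, the uniform sampling in the first round, and the tie-breaking rule are all assumptions made here. With `num_sampled=None` every data point is a candidate; otherwise each round draws candidates with probability proportional to each point's current cost while the already chosen centers stay fixed.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance: the k-means cost of serving p from center q."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def greedy_centers(points, rounds, num_sampled=None, seed=0):
    """Grow a center set greedily, one center per round (illustrative sketch).

    points      : list of equal-length numeric tuples
    rounds      : number of centers to add (hypothetical parameter name)
    num_sampled : None -> all points are candidates in every round;
                  int  -> sample that many candidates per round
    Returns (centers, total_cost).
    """
    rng = random.Random(seed)
    # nearest[i] is the cost of points[i] under the centers chosen so far.
    nearest = [float("inf")] * len(points)
    centers = []
    for _ in range(rounds):
        if num_sampled is None:
            candidates = points
        elif not centers:
            # Assumption: no costs exist yet, so the first round samples uniformly.
            candidates = rng.choices(points, k=num_sampled)
        else:
            # Sample proportionally to current cost; chosen centers are held fixed.
            candidates = rng.choices(points, weights=nearest, k=num_sampled)

        best_cost, best_nearest, best_center = None, None, None
        for c in candidates:
            updated = [min(d, sq_dist(p, c)) for p, d in zip(points, nearest)]
            cost = sum(updated)
            # Ties go to the earliest candidate (an arbitrary choice).
            if best_cost is None or cost < best_cost:
                best_cost, best_nearest, best_center = cost, updated, c

        centers.append(best_center)
        nearest = best_nearest
        if best_cost == 0:
            break  # nothing left to improve
    return centers, sum(nearest)

points = [(0, 0), (0, 1), (10, 0), (10, 1)]
centers, cost = greedy_centers(points, rounds=2)
```

In the bi-criteria setting, `rounds` would be chosen larger than k (the abstract gives $O(k \log(1/\varepsilon))$ centers), and the resulting cost is compared against the optimal cost with only k centers. Evaluating every candidate against every point costs time proportional to the number of points times the number of candidates per round, which is why the sampled variant is attractive on larger inputs.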

Similar resources

A hybrid DEA-based K-means and invasive weed optimization for facility location problem

In this paper, instead of the classical approach to the multi-criteria location selection problem, a new approach was presented based on selecting a portfolio of locations. First, the indices affecting the selection of maintenance stations were collected. The K-means model was used for clustering the maintenance stations. The optimal number of clusters was calculated through the Silhou...

Greedy Approximation Algorithms for K-Medians by Randomized Rounding

We give an improved approximation algorithm for the general k-medians problem. Given any $\varepsilon > 0$, the algorithm finds a solution of total distance at most $D(1 + \varepsilon)$ using at most $k \ln(n + n/\varepsilon)$ medians (a.k.a. sites), provided some solution of total distance $D$ using $k$ medians exists. This improves over the best previous bound (w.r.t. the number of medians) by a factor of $\Omega(1/\varepsilon)$ provided $1/\varepsilon = n^{O(1)}$. The algori...

Greedy Minimization of Weakly Supermodular Set Functions

This paper defines weak-α-supermodularity for set functions. It shows that minimizing such functions under cardinality constraints is a common task in machine learning and data mining. Moreover, any problem whose objective function exhibits this property benefits from a greedy extension phase. Explicitly, let S∗ be the optimal set of cardinality k that minimizes f and let S0 be an initial soluti...

A Constant-Factor Bi-Criteria Approximation Guarantee for k-means++

This paper studies the k-means++ algorithm for clustering as well as the class of $D^\ell$ sampling algorithms to which k-means++ belongs. It is shown that for any constant factor β > 1, selecting βk cluster centers by $D^\ell$ sampling yields a constant-factor approximation to the optimal clustering with k centers, in expectation and without conditions on the dataset. This result extends the previously known...

Sub-optimality Approximations

The sub-optimality approximation problem considers an optimization problem O, its optimal solution σ∗, and a variable x with domain {d1, . . . , dm} and returns approximations to O[x ← d1], . . . , O[x ← dm], where O[x ← di] denotes the problem O with x assigned to di. The sub-optimality approximation problem is at the core of online stochastic optimization algorithms and it can also be used for so...

Journal title:
  • CoRR

Volume abs/1607.06203

Publication date 2016